Improving profile HMM discrimination by adapting transition probabilities.

Authors

  • Markus Wistrand
  • Erik L L Sonnhammer
Abstract

Profile hidden Markov models (HMMs) are used to model protein families and for detecting evolutionary relationships between proteins. Such a profile HMM is typically constructed from a multiple alignment of a set of related sequences. Transition probability parameters in an HMM are used to model insertions and deletions in the alignment. We show here that taking into account unrelated sequences when estimating the transition probability parameters helps to construct more discriminative models for the global/local alignment mode. After normal HMM training, a simple heuristic is employed that adjusts the transition probabilities between match and delete states according to observed transitions in the training set relative to the unrelated (noise) set. The method is called adaptive transition probabilities (ATP) and is based on the HMMER package implementation. It was benchmarked in two remote homology tests based on the Pfam and the SCOP classifications. Compared to the HMMER default procedure, the rate of misclassification was reduced significantly in both tests and across all levels of error rate.
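
The abstract does not give the ATP update rule itself, so the following is only a minimal sketch of what an adjustment of this kind could look like, not the authors' method. It assumes a hypothetical rule: the match-to-delete probability at one model position is scaled by the ratio of the transition frequency observed in the training set to that observed in the noise set, and the outgoing match-state transitions are then renormalized. The function name, the `strength` and `eps` parameters, and the dictionary layout are all illustrative assumptions and are not part of HMMER.

```python
def adapt_match_transitions(trans, train_freq, noise_freq, strength=1.0, eps=1e-6):
    """Rescale the match->delete probability for one model position.

    Hypothetical rule (an assumption, not the published ATP formula):
    multiply P(M->D) by (train_freq / noise_freq) ** strength, then
    renormalize the three outgoing match-state transitions.

    trans      -- dict with keys "MM", "MI", "MD" that sum to 1
    train_freq -- observed M->D frequency in the training (family) set
    noise_freq -- observed M->D frequency in the unrelated (noise) set
    eps        -- small constant to avoid division by zero
    """
    ratio = ((train_freq + eps) / (noise_freq + eps)) ** strength
    scaled_md = trans["MD"] * ratio
    total = trans["MM"] + trans["MI"] + scaled_md
    return {
        "MM": trans["MM"] / total,
        "MI": trans["MI"] / total,
        "MD": scaled_md / total,
    }


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    trained = {"MM": 0.90, "MI": 0.05, "MD": 0.05}
    adapted = adapt_match_transitions(trained, train_freq=0.2, noise_freq=0.1)
    print(adapted)
```

Under this assumed rule, a deletion that is more common in the family than in the noise set becomes cheaper in the model, and one that is more common in the noise set becomes more expensive, which is the direction of effect the abstract describes.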

Similar resources

Transition Priors for Protein Hidden Markov Models: An Empirical Study towards Maximum Discrimination

Insertions and deletions in a profile hidden Markov model (HMM) are modeled by transition probabilities between insert, delete and match states. These are estimated by combining observed data and prior probabilities. The transition prior probabilities can be defined either ad hoc or by maximum likelihood (ML) estimation. We show that the choice of transition prior greatly affects the HMM's abil...

Hidden Markov Models for Remote Protein Homology Detection

Genome sequencing projects are advancing at a staggering pace and are daily producing large amounts of sequence data. However, the experimental characterization of the encoded genes and proteins is lagging far behind. Interpretation of genomic sequences therefore largely relies on computational algorithms and on transferring annotation from characterized proteins to related uncharacterized prot...

State-Transition Interpolation and MAP Adaptation for HMM-based Dysarthric Speech Recognition

This paper describes the results of our experiments in building speaker-adaptive recognizers for talkers with spastic dysarthria. We study two modifications: (a) MAP adaptation of speaker-independent systems trained on normal speech and (b) using a transition probability matrix that is a linear interpolation between fully ergodic and (exclusively) left-to-right structures, for both speaker-dep...

Hidden Markov Models in Protein Modeling

The use of Hidden Markov Models (HMM) in protein modeling is described. Sequence alignment based on profile HMMs can help identify protein family members and presents some advantages. This possibility is discussed. Introduction. The functional and structural characterization of new proteins can be done by taking advantage of their evolutionary relation with proteins of known structure or func...

Recent Topics in Speech Recognition Research at NTT Laboratories

This paper introduces three recent topics in speech recognition research at NTT (Nippon Telegraph and Telephone) Human Interface Laboratories. The first topic is a new HMM (hidden Markov model) technique that uses VQ-code bigrams to constrain the output probability distribution of the model according to the VQ-codes of previous frames. The output probability distribution changes depending on th...

Journal:
  • Journal of molecular biology

Volume 338, Issue 4

Pages: -

Publication date: 2004